Working with Cytometry Data in R

Mass Cytometry Course 2019

Benjamin Reisman

March 22, 2019

Why Use R?

“R is a free software environment for statistical computing and graphics”

Compared to Commercial Flow Cytometry Software, R has the following advantages:

Goals of this talk:

In R, things that look hard are easy, but things that look easy are (a little) hard.

Representing Data in R

Example: The iris dataset: measurements of 50 flowers of 3 species of iris

##   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1          5.1         3.5          1.4         0.2  setosa
## 2          4.9         3.0          1.4         0.2  setosa
## 3          4.7         3.2          1.3         0.2  setosa
## 4          4.6         3.1          1.5         0.2  setosa
## 5          5.0         3.6          1.4         0.2  setosa
## 6          5.4         3.9          1.7         0.4  setosa

Representing Data in R: Data Frames

Data Frame: An \(n*m\) array of items, but each column can be a different class

## [1] "data.frame"
## 'data.frame':    150 obs. of  5 variables:
##  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
##  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
##  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
##  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

Representing Data in R: Matricies (1)

##      Sepal.Length Sepal.Width Petal.Length Petal.Width Species 
## [1,] "5.1"        "3.5"       "1.4"        "0.2"       "setosa"
## [2,] "4.9"        "3.0"       "1.4"        "0.2"       "setosa"
## [3,] "4.7"        "3.2"       "1.3"        "0.2"       "setosa"
## [4,] "4.6"        "3.1"       "1.5"        "0.2"       "setosa"
## [5,] "5.0"        "3.6"       "1.4"        "0.2"       "setosa"
## [6,] "5.4"        "3.9"       "1.7"        "0.4"       "setosa"
##  chr [1:150, 1:5] "5.1" "4.9" "4.7" "4.6" "5.0" "5.4" "4.6" "5.0" ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:5] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" ...

that doesn’t look right…

Representing Data in R: Matricies (2)

##      Sepal.Length Sepal.Width Petal.Length Petal.Width
## [1,]          5.1         3.5          1.4         0.2
## [2,]          4.9         3.0          1.4         0.2
## [3,]          4.7         3.2          1.3         0.2
## [4,]          4.6         3.1          1.5         0.2
## [5,]          5.0         3.6          1.4         0.2
## [6,]          5.4         3.9          1.7         0.4
##  num [1:150, 1:4] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

Defining Tidy Data

To work with data in R, it’s best to have ‘tidy data,’ which meets the following criteria:

  1. Each variable [‘feature’] must have its own column.
  2. Each observation [‘cell’] must have its own row.
  3. Each value must have its own cell [‘entry’].

…but cytometry data is not usually tidy.

For more information, see: Wickham, Hadley. “Tidy data.” Journal of Statistical Software 59.10 (2014): 1-23.

Representing Flow Cytometry Data in R

A number of specialized classes have been developed to represent high dimensional bioinformatics data:

Representing Flow Cytometry Data in R

A cytometry experiment may include:

  • FCS files
  • Compensations (FACS)
  • Transformations
  • Panels
  • Gates + Populations
  • Metadata

… but those aren’t neatly represented in R:

Traditional Object FlowCore Object R Equivalent
FCS File FlowFrame Matrix
Bunch of FCS File FlowSet List of matrices + pData
Gated Experiment Gatingset -

It’s easy to get flow cytometry data into R with the right tools

First we’ll need to load a few packages…

and find our files…

## [1] "KCL Guys Data/20180321-01 Group 1 Helios B Post-viSNE_Ungated.fcs.density.fcs.cluster.fcs"
## [2] "KCL Guys Data/20180321-02 Group 2 Helios A Post-viSNE_Ungated.fcs.density.fcs.cluster.fcs"
## [3] "KCL Guys Data/20180321-03 Group 3 Helios A Post-viSNE_Ungated.fcs.density.fcs.cluster.fcs"
## [4] "KCL Guys Data/20180321-04 Group 4 Helios B Post-viSNE_Ungated.fcs.density.fcs.cluster.fcs"

Cytotidyr and CytobankAPI can be used to work between Cytobank and R

Using CytobankAPI and Cytotidyr we’ll read in our experiment information from cytobank. This includes:

Reading in the Data

First we’ll read in the data as a flowSet

Then we’ll convert it to a gatingSet

## ....done!

Cytometry Preprocessing (Transformations, Gates, Panels) can be done in R

Next we’ll:

## B Cells
## T Cells
## CD4s
## CD8s
## ....done!

Cytotidyr allows us to convert the flowset to a tidy data.frame

In order to work with our data using R, we’ll need to convert it to a data frame, using the as.data.frame function from cytotidyr

## 'data.frame':    100000 obs. of  77 variables:
##  $ Time               : num  30.8 86.5 96.9 115.2 143.5 ...
##  $ Event_length       : num  28 15 15 15 15 14 23 14 22 17 ...
##  $ 89Y_CD45 (v)       : num  5.09 3.96 4.38 4.25 4.39 ...
##  $ 102Pd              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 103Rh_Viability (v): num  1.62 0 0 0 0 ...
##  $ 104Pd              : num  0.015 0 0 0 0 ...
##  $ 105Pd              : num  0 0 0 0.182 0 ...
##  $ 106Pd              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 108Pd              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 110Pd              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 113In              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 114Cd              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 115In              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 120Sn              : num  0 0 0 0 0.014 ...
##  $ 127I               : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 131Xe              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 133Cs              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 138Ba              : num  3.19 1.2 1.72 2.48 1.66 ...
##  $ 139La              : num  0 0 0 0 0 ...
##  $ 140Ce_EQ_Beads (v) : num  0 0.238 0 0 0 ...
##  $ 141Pr_CCR6 (v)     : num  0.345 0 0 0 0 ...
##  $ 142Nd              : num  0 0.268 0 0.188 0.376 ...
##  $ 143Nd              : num  0 0.766 0 0 0 ...
##  $ 144Nd              : num  0.0076 0.4031 0.3158 0.1317 0.0934 ...
##  $ 145Nd_CD4 (v)      : num  4.23749 0.65661 0.00594 3.82307 0.87242 ...
##  $ 146Nd_CD8 (v)      : num  2.964 5.773 2.319 0.535 5.619 ...
##  $ 147Sm_CD20 (v)     : num  0.1926 0 0 0 0.0314 ...
##  $ 148Nd_CD16 (v)     : num  0.0783 0.1981 3.3858 0 0.4623 ...
##  $ 149Sm_CCR4 (v)     : num  0 0 0 0.0259 0 ...
##  $ 150Nd              : num  0 0 0 0 0 ...
##  $ 151Eu              : num  0 0 0 0 0 ...
##  $ 152Sm              : num  1.4267 0 0 0.2121 0.0263 ...
##  $ 153Eu              : num  0 0 0 0 0 ...
##  $ 154Sm_CD3 (v)      : num  5.87 4.84 0 5.21 5.26 ...
##  $ 155Gd_CD45RA (v)   : num  3.23 3.3 3.01 3.37 4.37 ...
##  $ 156Gd              : num  0.00352 0 0 0 0 ...
##  $ 157Gd              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 158Gd              : num  0 0 0 0 0 ...
##  $ 159Tb_CCR7 (v)     : num  3.51 2.34 0 3.03 3.22 ...
##  $ 160Gd_CD14 (v)     : num  0 0 0 0 0.0252 ...
##  $ 161Dy              : num  0.303 0 0 0 0 ...
##  $ 162Dy              : num  0.0254 0.8437 0 0 0.8895 ...
##  $ 163Dy_CXCR3 (v)    : num  0 0.0603 0 0 3.2913 ...
##  $ 164Dy              : num  0 0 0.415 0 0 ...
##  $ 165Ho_CD45RO (v)   : num  0 0 0 0 0 ...
##  $ 166Er              : num  0.567 0 0 0 0 ...
##  $ 167Er_CD27 (v)     : num  3.15 3.09 0 2.9 3.72 ...
##  $ 168Er              : num  0.0626 0.0415 0 0 0.2607 ...
##  $ 169Tm_CD25 (v)     : num  0 0 0 0 0 ...
##  $ 170Er              : num  0 0 0 0.495 0 ...
##  $ 171Yb              : num  0 0 0 0 0 ...
##  $ 172Yb_CD57 (v)     : num  0 0 0 0 0 ...
##  $ 173Yb              : num  0 0 0 0 0 ...
##  $ 174Yb_HLA-DR (v)   : num  0 0 0.314 0 0.51 ...
##  $ 175Lu              : num  0 0 0 0.118 0 ...
##  $ 176Yb_CD127 (v)    : num  2.32 1.81 0 1.64 2.26 ...
##  $ 177Hf              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ 190BCKG            : num  0 0 0 0 0.152 ...
##  $ 191Ir_DNA (v)      : num  7.1 6.06 6.31 5.31 6.33 ...
##  $ 193Ir              : num  7.58 6.67 6.89 5.99 6.93 ...
##  $ 194Pt              : num  1.799 1.418 1.301 0.254 0.744 ...
##  $ 195Pt              : num  0 0 0 0.119 0 ...
##  $ 196Pt              : num  0 0 0.53 0 0 ...
##  $ 198Pt              : num  0.214 0 0.143 0 0 ...
##  $ 207Pb              : num  0 0 0 0 0 ...
##  $ 208Pb              : num  0.274 0.212 0 0.169 0 ...
##  $ 209Bi              : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Center             : num  6.18 5.37 5.48 5.48 5.32 ...
##  $ Offset             : num  4.45 3.53 3.74 3.66 3.54 ...
##  $ Width              : num  4.15 2.71 2.93 2.88 2.73 ...
##  $ Residual           : num  4.58 3.23 3.17 3.63 3.36 ...
##  $ tSNE1              : num  11.85 24.4 15.83 9.79 19.45 ...
##  $ tSNE2              : num  -5.23 -7.749 16.926 7.176 0.364 ...
##  $ density            : num  7.97 7.63 6.83 8.44 6.57 ...
##  $ cluster            : num  25 45 57 25 53 62 9 53 52 5 ...
##  $ FCS Filename       : chr  "20180321-04 Group 4 Helios B Post-viSNE_Ungated.fcs.density.fcs.cluster.fcs" "20180321-04 Group 4 Helios B Post-viSNE_Ungated.fcs.density.fcs.cluster.fcs" "20180321-04 Group 4 Helios B Post-viSNE_Ungated.fcs.density.fcs.cluster.fcs" "20180321-04 Group 4 Helios B Post-viSNE_Ungated.fcs.density.fcs.cluster.fcs" ...
##  $ Group              : chr  "Group 4" "Group 4" "Group 4" "Group 4" ...

Making Cytometry Figures in R (1)

One thing we may want to do is reproduce the same tSNE figure we made on cytobank:

Making Cytometry Figures in R (2)

We can also customize our plots in ways that are not easy to do in cytobank:

Making Cytometry Figures in R (3)

We may also want to plot multiple channels in the same plot with faceting:

## [1] 100000     77
## [1] 2100000      60

Making Cytometry Figures in R (4)

Then we’ll make our plot:

Applying alternative dimensionality reduction techniques (1)

One of the advantages of R is that we’re not limited to the dimensionality reduction techniques that are included in commercial packages.

Applying alternative dimensionality reduction techniques (2)

First we’ll need to create a seperate matrix containing the columns we want to be included in the dimensionality reduction.

Then we’ll run it through the uwot implementation of UMAP

##  num [1:100000, 1:2] -7.8976 1.4036 8.1545 -6.3052 0.0437 ...
##  - attr(*, "scaled:center")= num [1:2] -0.0042 0.061

Applying alternative dimensionality reduction techniques (3)

Next, we’ll rejoin the two new UMAP columns to our original dataframe, and make our plot:

Applying alternative dimensionality reduction techniques (4)

Applying alternative dimensionality reduction techniques (5)

We can also plot our data as a map:

Clustering in R

We can also apply a clustering algorithm to our dimensionality reduced data.

## DBSCAN clustering for 100000 objects.
## Parameters: eps = 0.3, minPts = 150
## The clustering contains 22 cluster(s) and 453 noise points.
## 
##     0     1     2     3     4     5     6     7     8     9    10    11 
##   453  2073  8279 37863  4422  6956  2559 15082  1102  1219  2836  2133 
##    12    13    14    15    16    17    18    19    20    21    22 
##  2693  6472   376  1731  1722   677   522   208   353   176    93 
## 
## Available fields: cluster, eps, minPts

Understanding our clusters

We have clusters, but how can we understand what makes them distinct?

What else is out there

CyTOFkit

  • Chen, H, et al., PLOS Computational Biology 2016
  • Integrated pipeline for analysing cytometry data in R

CytoRSuite

  • Set of interactive tools that integrate with flowWorkspace
  • Interactive gating, compensations, panels, etc…

Resources for Learning More